High-Level Music Descriptor Extraction Algorithm Based on Combination of Multi-Channel CNNs and LSTM
نویسندگان
چکیده
Although Convolutional Neural Networks (CNNs) and Long Short Term Memory (LSTM) have yielded impressive performances in a variety of Music Information Retrieval (MIR) tasks, the complementarity among the CNNs of different architectures and that between CNNs and LSTM are seldom considered. In this paper, multichannel CNNs with different architectures and LSTM are combined into one unified architecture (Multi-Channel Convolutional LSTM, MCCLSTM) to extract high-level music descriptors. First, three channels of CNNs with different shapes of filter are applied on each spectrogram image chunk to extract the pitch-, tempo-, and bassrelevant descriptors, respectively. Then, the outputs of each CNNs channel are concatenated and then passed through a fully connected layer to obtain the fused descriptor. Finally, LSTM is applied on the fused descriptor sequence of the whole track to extract its long-term structure property to obtain the high-level descriptor. To prove the efficiency of the MCCLSTM model, the obtained high-level music descriptor is applied to the music genre classification and emotion prediction task. Experimental results demonstrate that, when compared with the hand-crafted schemes or conventional deep learning (Multi Layer Perceptrons (MLP), CNNs, and LSTM) based ones, MCCLSTM achieves higher prediction accuracy on three music collections with different kinds of semantic tags.
منابع مشابه
Classification of Iranian Traditional Music Dastgahs Using Features Based on Pitch Frequency
The Iranian traditional music is composed of seven majors Dastgahs: Chahargah, Homayoun, Mahour, Segah, Shour, Nava, and Rast-Panjgah. In this paper, a new algorithm for the classification of the Iranian traditional music Dastgahs based on pitch frequency is proposed. In this algorithm, the features of Lagrange coefficients of pitch logarithm (LCPL), Fuzzy similarity sets type 2 (FSST2), and th...
متن کاملFrom Sound to ‘Sense’ via Feature Extraction and Machine Learning: Deriving High-Level Descriptors for Characterising Music
Research in intelligent music processing is experiencing an enormous boost these days due to the emergence of the new application and research field of Music Information Retrieval (MIR). The rapid growth of digital music collections and the concomitant shift of the music market towards digital music distribution urgently call for intelligent computational support in the automated handling of la...
متن کاملRecognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
متن کاملA Novel Multicast Tree Construction Algorithm for Multi-Radio Multi-Channel Wireless Mesh Networks
Many appealing multicast services such as on-demand TV, teleconference, online games and etc. can benefit from high available bandwidth in multi-radio multi-channel wireless mesh networks. When multiple simultaneous transmissions use a similar channel to transmit data packets, network performance degrades to a large extant. Designing a good multicast tree to route data packets could enhance the...
متن کاملPseudo Zernike Moment-based Multi-frame Super Resolution
The goal of multi-frame Super Resolution (SR) is to fuse multiple Low Resolution (LR) images to produce one High Resolution (HR) image. The major challenge of classic SR approaches is accurate motion estimation between the frames. To handle this challenge, fuzzy motion estimation method has been proposed that replaces value of each pixel using the weighted averaging all its neighboring pixels i...
متن کامل